Accurate interpretation of chest radiographs outcome in epidemiological [...] interpretation of chest
radiographs. This variability often leads to wrong diagnosis because chest diseases often have common
symptoms. Moreover, there is no single reliable test that can identify the symptoms of pneumonia.
Therefore, this paper presents a standardized approach using a convolutional neural network (CNN) and
transfer learning technique for identifying pneumonia from chest radiographs that ensures accurate
diagnosis and assists physicians in making precise prescriptions for the treatment of pneumonia. A
training set consisting of 5,232 images from the optical coherence tomography and chest X-ray images
dataset in the Mendeley public database was used for this research, and the performance evaluation of
the model developed on the test set yielded 88.14% accuracy, 90% precision, 85% recall and F1 score of
0.87.

Keywords: Chest radiograph, Deep learning, Diagnosis, Neural network, Pneumonia

1. INTRODUCTION

Pneumonia is a pulmonary disease in which the air sacs in the lungs, also referred to as alveoli, are
filled with fluid such as pus [1]. It is a pulmonary infection caused by viruses or bacteria and results
in the death of approximately 1.4 million children yearly. This figure accounts for about 18% of all
deaths of children under five years of age. Globally, nearly 156 million children are currently suffering
from pneumonia [2]. Findings reveal a heavy global burden of communicable diseases, with acute
respiratory infections causing about 30% of childhood deaths worldwide [3]. Unlike other parts of the
human body, the chest region is difficult to access, which makes the diagnosis of common chest ailments
very challenging for medical practitioners [4], [5]. To reduce the mortality rate caused by chest region
diseases such as pneumonia, the World Health Organization (WHO) established the Child Health
Epidemiology Reference Group (CHERG) in the year 2001. CHERG was charged with systematically reviewing
and improving the data collection, methods, and assumptions underlying the estimates of the distribution
of causes of death in children for the year 2000 [6].
2. REVIEW OF LITERATURE

Chest radiography, also known as chest X-ray (CXR), is among the routinely performed radiological
procedures; it uses a small dose of ionizing radiation to capture images of the interior of the human
chest, lungs, and heart [7]. It is useful in diagnosing, monitoring, and treating diverse lung conditions
such as cancer, pneumonia, and tuberculosis [8]. Radiological results have been a major means of
diagnosing pneumonia, but the major problem with the approach is the lack of uniformity in the
interpretation of chest radiographs [9], [10]; hence a standard approach is required, since there is no
single reliable test that can identify the symptoms of pneumonia. Medical imaging refers to the
techniques and procedures used to create images of parts of the human body, such as radiography,
magnetic resonance imaging, ultrasound, and endoscopy [11]. Computers can be leveraged to analyze
medical images and gain a better understanding and interpretation of them [12], by exploiting
hierarchical feature representations learned from data instead of the common handcrafted features that
are mostly designed using domain-specific knowledge [13]. Deep learning incorporates feature engineering
into the learning step [14] and therefore requires only a dataset with little pre-processing, from which
informative representations are discovered in a self-learning manner [15], [16]. Popular recent
applications include AlphaGo and AlphaZero, developed by DeepMind [17]. Deep learning is also used in
object detection to locate the position of an object in an image, which is useful for detecting early
signs of abnormality in patients. Furthermore, it is used in image segmentation to find the anatomical
structures present in an image.

Deep learning has received significant attention due to its ability to process a huge number of features
when dealing with unstructured data, as can be found in [18], [19]. It was implemented in [20], [21] for
the detection and localization of abnormalities in chest radiographs with considerable success. At the
center of deep learning are artificial neural networks (ANNs). Earlier models rely on features manually
extracted from raw data or on features learned by other simple models, whereas deep learning enables
systems to automatically learn useful representations and features from raw data, without the tedious
manual procedure. Its choice in medical image analysis is mostly driven by convolutional neural networks
(CNNs) [22]-[24], which are good at learning useful representations of images and other structured data.
Before CNNs could be used effectively, features typically had to be designed by hand; CNNs can identify
the features that are relevant in a dataset without human interaction [25], [26], which makes it
practicable to utilize features learned directly from data [27], [28]. While ANNs need much data to learn
the patterns and associations in data, deep learning does not. The diagnosis of diseases of the chest
using radiographs has aroused research interest and has been deployed for the diagnosis of lung
nodules [29] and the classification of lung tuberculosis [30]. Using open datasets, many convolutional
models were evaluated on several abnormal features [31], which revealed that the same CNN does not
replicate its performance on every abnormal feature. Compared with rule-based techniques, deep learning
techniques improve accuracy. Statistical dependencies between labels were exploited to obtain more
accurate predictions, resulting in better performance than other methods on 13 of 14 classes [32].
Algorithms for mining and predicting labels from radiographs and their reports have been
researched [33]-[35]; however, the labels were all limited to disease labels, which resulted in a lack
of contextual information. Detection of diseases from radiographs was studied in [36]-[38],
categorization of radiographs by image view was reported in [39], and segmentation of body parts from
chest radiographs and computed tomography was implemented in [40]. Inception-v3 is a well-known model
that can be leveraged to achieve very high accuracy in image recognition [41]; it was applied by
Bar et al. [42] with encouraging results and is used in this paper because it requires few computing
resources.


3. METHOD 

The data used in this paper were obtained from the optical coherence tomography and chest X-ray images
dataset in the Mendeley public database [43]. As presented in Figure 1, the training set comprises 5,232
of the 5,856 chest X-ray images collected from children. Of these, 3,883 X-ray images belong to children
diagnosed with pneumonia, while 1,349 belong to children free from pneumonia. The validation set
contained 16 images and the test set 624 images. The images were labeled, as is done in supervised
learning. A model was created using Inception-V3 transfer learning on TensorFlow and trained on the
5,232 training images. The trained model was tested with 624 images, of which 390 show pneumonia and
234 are from normal children.

The research design consists of steps implemented on the Inception-v3 CNN, as indicated in Figure 2.
The first stage constitutes the system architecture. In the second stage, the images are read into the
system, while the third stage involves pre-processing the input images. The input images were irregular
in size and could not be passed to the learning algorithm, which expects input images of size 224x224.
Using bilinear interpolation, the images were resized to the required dimension. The images were
represented as arrays of pixel values ranging from intensity level 0 to 255. To ensure that the data are
suitable for learning, the pixel values were scaled down using (1).


$$ P' = \frac{p}{255} \qquad (1) $$


Where p is the original pixel value and P' is the new pixel value within the range 0 to 1.
The pre-processing tasks thus consist of resizing the images to a uniform dimension and scaling each
pixel to the range 0 to 1. A data generator object was used to deliver the images in batches of 64.
The next step is training the system before finally generating the model.
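
The pre-processing steps described above (bilinear resizing, scaling to the range 0 to 1, and delivery in batches) can be sketched in plain Python. This is only an illustration under stated assumptions, not the data generator used in this work: the function names are hypothetical, the images are treated as 2-D grayscale lists, corner-aligned sampling is assumed for the interpolation, and the scaling step is assumed to be a division by 255.

```python
from typing import Iterator, List

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def resize_bilinear(img: Image, out_h: int, out_w: int) -> Image:
    """Resize a 2-D image with bilinear interpolation (corner-aligned sampling)."""
    in_h, in_w = len(img), len(img[0])
    # Scale factors that map output coordinates onto the input grid.
    sy = (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
    sx = (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
    out = []
    for i in range(out_h):
        y = i * sy
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            x = j * sx
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = img[y0][x0] * (1 - wx) + img[y0][x1] * wx
            bottom = img[y1][x0] * (1 - wx) + img[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


def rescale(img: Image) -> Image:
    """Map intensities from [0, 255] to [0, 1] (assumed: division by 255)."""
    return [[p / 255.0 for p in row] for row in img]


def preprocess(img: Image, size: int = 224) -> Image:
    """Resize to size x size, then rescale the pixel values."""
    return rescale(resize_bilinear(img, size, size))


def batches(images: List[Image], batch_size: int = 64) -> Iterator[List[Image]]:
    """Yield consecutive batches; the last batch may be smaller."""
    for start in range(0, len(images), batch_size):
        yield images[start:start + batch_size]
```

With a batch size of 64, the 5,232 training images would be delivered in 82 batches, the last of which holds 48 images.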
Figure 1. Workflow diagram
Figure 2. The basic architecture of Inception-V3











4. TRAINING THE NETWORK 

Transfer learning was carried out from a pre-trained base model (Inception-V3). It is a publicly
available model trained on the ImageNet database, which contains about 14 million annotated images; the
pre-trained model classifies images into 1000 categories of objects. It is a deep CNN architecture that
was trained for detection and classification based on the ImageNet Large-Scale Visual Recognition
Challenge 2014 (ILSVRC14). The network's architecture was specially designed to make optimal use of
computing resources. The network is 27 layers deep, including 5 pooling layers, as shown in Table 1.
To adapt this architecture to the objective of diagnosing pneumonia from X-ray images, a global average
pooling layer and a new dense layer were added to the end of the network. A new two-class output layer
replaced the 1000-class softmax output layer.
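
To make the role of the added layers concrete, the following framework-free sketch shows what a global average pooling layer followed by a dense two-class softmax output computes on a feature map. It is not the TensorFlow code used in this work: the function names are hypothetical, the weights are toy values rather than trained parameters, and the order of the class labels is an assumption.

```python
import math
from typing import List, Sequence, Tuple

FeatureMap = List[List[List[float]]]  # height x width x channels


def global_average_pool(fmap: FeatureMap) -> List[float]:
    """Average each channel over all spatial positions."""
    h, w, c = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [sum(fmap[i][j][k] for i in range(h) for j in range(w)) / (h * w)
            for k in range(c)]


def dense(x: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over the class scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(fmap: FeatureMap,
             weights: List[List[float]],
             bias: List[float],
             labels: Sequence[str] = ("normal", "pneumonia")) -> Tuple[str, List[float]]:
    """Pool the feature map, apply the two-class dense layer, return label and probabilities."""
    probs = softmax(dense(global_average_pool(fmap), weights, bias))
    return labels[probs.index(max(probs))], probs


# Toy 2x2 feature map with two channels and hand-picked (untrained) weights.
toy_map = [[[1.0, 0.0], [2.0, 0.0]], [[3.0, 0.0], [4.0, 0.0]]]
label, probs = classify(toy_map, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

In transfer learning, only a small head of this kind needs to be trained from scratch, while the convolutional layers of the pre-trained base supply the feature map.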
Table 1. Layers of the network

Type            Patch size/stride   Output size   Depth   #1x1   #3x3   #5x5   Pool proj.
Convolution     7x7/2               112x112x64    1
Max pool        3x3/2               56x56x64      0
Convolution     3x3/1               56x56x192     2              192
Max pool        3x3/2               28x28x192     0
Inception (3a)                      28x28x256     2       64     128    32     32
Inception (3b)                      28x28x480     2       128    192    96     64
Max pool        3x3/2               14x14x480     0
Inception (4a)                      14x14x512     2       192    208    48     64
Inception (4b)                      14x14x512     2       160    224    64     64
Inception (4c)                      14x14x512     2       128    256    64     64
Inception (4d)                      14x14x528     2       112    288    64     64
Inception (4e)                      14x14x832     2       256    320    128    128
Max pool        3x3/2               7x7x832       0
Inception (5a)                      7x7x832       2       256    320    128    128
Inception (5b)                      7x7x1024      2       384    384    128    128
Avg pool        7x7/1               1x1x1024      0
Dropout (40%)                       1x1x1024      0
Linear                              1x1x1000      1
Softmax                             1x1x1000      0
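
The output sizes listed in Table 1 can be checked with a little arithmetic. In the sketch below, each stride-2 layer halves the spatial size of a 224x224 input, and the depth of each Inception output equals the sum of the filter counts of its four branches. The names are hypothetical, and the rounding rule (ceiling of size divided by stride) is an assumption, since the table does not list padding.

```python
import math

# (#1x1, #3x3, #5x5, pool proj) filter counts and the listed output depth,
# copied from the Inception rows of Table 1.
INCEPTION_ROWS = {
    "3a": ((64, 128, 32, 32), 256),
    "3b": ((128, 192, 96, 64), 480),
    "4a": ((192, 208, 48, 64), 512),
    "4b": ((160, 224, 64, 64), 512),
    "4c": ((128, 256, 64, 64), 512),
    "4d": ((112, 288, 64, 64), 528),
    "4e": ((256, 320, 128, 128), 832),
    "5a": ((256, 320, 128, 128), 832),
    "5b": ((384, 384, 128, 128), 1024),
}

# Strides of the layers before the final average pool, in table order.
# Inception modules keep the spatial size, so they are not listed here.
STRIDES = [
    ("convolution 7x7/2", 2),
    ("max pool 3x3/2", 2),
    ("convolution 3x3/1", 1),
    ("max pool 3x3/2", 2),
    ("max pool 3x3/2", 2),
    ("max pool 3x3/2", 2),
]


def output_size(size: int, stride: int) -> int:
    """Spatial size after a strided layer, assuming ceil(size / stride)."""
    return math.ceil(size / stride)


def spatial_sizes(input_size: int = 224) -> list:
    """Spatial size after each layer in STRIDES, starting from input_size."""
    sizes, size = [], input_size
    for _name, stride in STRIDES:
        size = output_size(size, stride)
        sizes.append(size)
    return sizes


def depth_mismatches() -> list:
    """Inception modules whose branch filters do not sum to the listed depth."""
    return [name for name, (branches, depth) in INCEPTION_ROWS.items()
            if sum(branches) != depth]


print(spatial_sizes())     # [112, 56, 56, 28, 14, 7]
print(depth_mismatches())  # []
```

The final 7x7 average pool then reduces the 7x7x1024 feature map to the 1x1x1024 vector listed in the table.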





5. MODEL EVALUATION METRICS 
The performance was evaluated using the following metrics:
- Classification accuracy: this is the ratio of the correctly classified images to the total number of
image samples, expressed as a percentage and presented in (2).

$$ \text{Accuracy} = \frac{CCI}{TNI} \times 100 \qquad (2) $$

Where, CCI = correctly classified images and TNI = total number of images.
- Precision: this is the ratio of the number of images correctly classified as having pneumonia to the
total number of images classified as having pneumonia, that is, the sum of the images correctly and
wrongly classified as having pneumonia. This is presented in (3).

$$ \text{Precision} = \frac{CDP}{CDP + WDP} \qquad (3) $$

Where, CDP = correctly diagnosed pneumonia and WDP = wrongly diagnosed pneumonia.
- Recall (sensitivity): this is the ratio of the number of images correctly classified as having
pneumonia to the total number of images that actually have pneumonia, that is, the sum of the images
correctly classified as having pneumonia and the pneumonia images wrongly classified as normal. This is
presented in (4).

$$ \text{Recall} = \frac{CDP}{CDP + WDAP} \qquad (4) $$

Where, CDP = correctly diagnosed pneumonia and WDAP = pneumonia cases wrongly diagnosed as normal.
- F1 score: it is the harmonic mean of recall and precision. This measure shows the balance between
precision and recall. This is presented in (5).

$$ F1 = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \qquad (5) $$
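
As a minimal sketch, the four metrics can be computed from lists of true and predicted labels using their standard definitions, with pneumonia taken as the positive class. The function names and the five-image example are hypothetical and serve only to illustrate the arithmetic; they are not taken from the evaluation script of this work.

```python
from typing import Sequence, Tuple


def binary_counts(y_true: Sequence[str], y_pred: Sequence[str],
                  positive: str = "pneumonia") -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) with `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1          # correctly diagnosed pneumonia
        elif pred == positive:
            fp += 1          # normal image wrongly diagnosed as pneumonia
        elif truth == positive:
            fn += 1          # pneumonia image wrongly diagnosed as normal
        else:
            tn += 1          # correctly diagnosed normal
    return tp, fp, fn, tn


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Correctly classified images over all images, as a percentage."""
    return 100.0 * (tp + tn) / (tp + fp + fn + tn)


def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn) if tp + fn else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


# Hypothetical example: five images, one false negative and one false positive.
y_true = ["pneumonia", "pneumonia", "pneumonia", "normal", "normal"]
y_pred = ["pneumonia", "pneumonia", "normal", "pneumonia", "normal"]
tp, fp, fn, tn = binary_counts(y_true, y_pred)
p, r = precision(tp, fp), recall(tp, fn)
print(accuracy(tp, fp, fn, tn), p, r, f1_score(p, r))
```

In this example three of the five images are classified correctly, so the accuracy is 60%, while precision, recall and F1 score are each 2/3.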


6. RESULT AND DISCUSSION 

This work was implemented in the Python programming language in a Python notebook environment.
Training was done in the 'train.py' script, evaluation and report generation in the 'evaluate.py'
script, and prediction in the 'predict.py' script. The dataset used for this work contains two classes
of images. The first class contains radiographic images of children suffering from pneumonia. The second
class contains radiographic images of normal children. Samples of the radiographic images of normal
children are presented in Figure 3, while images of children with pneumonia are presented in Figure 4.
The model was trained for 10 epochs. The system achieved a training accuracy of 95.66% with a loss of
0.1135 and a validation accuracy of 93.75% with a loss of 0.0854.
Figure 3. Radiographic image of normal children
Figure 4. Children with pneumonia


7. EVALUATION OF THE MODEL 

The metrics used for the model evaluation include accuracy, precision, recall, and F1 score. Evaluation
of the model on the test set yielded 88.14% accuracy, 90% precision, 85% recall and an F1 score of 0.87,
as presented in Figure 5. This is supported by the confusion matrix presented in Table 2, where it is
clear that out of 624 cases, presence of pneumonia was predicted 390 times and normal was predicted
234 times.


score = model.evaluate(test_data_gen, verbose=1)
63/63 [==============================] - 7s 111ms/step - loss: 0.3678 - accuracy: 0.8814

Figure 5. Model evaluation results


8. THE CONFUSION MATRIX 

The performance of a classifier on a set of test data for which the true values are known is often
described using a confusion matrix. The confusion matrix in Table 2 shows the actual classes and the
predicted classes for the cases in the test set in this work. From the table, the total predicted as
normal is 234 and the total predicted as pneumonia is 390. The total for actual normal is 180, while the
total for actual pneumonia is 444.


Table 2. The confusion matrix

N = 624                Actual: normal   Actual: pneumonia   Total
Predicted: normal      170              64                  234
Predicted: pneumonia   10               380                 390
Total                  180              444                 624
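
The reported figures can be related to the counts in Table 2 with a short calculation. The sketch below computes the accuracy and the macro-averaged (unweighted mean over both classes) precision, recall and F1 score from the four cells. The function names are hypothetical, and the use of macro averaging is an inference from the numbers rather than something stated in the text. The same caveat applies to the two readings of the table: with the rows read as predicted classes, as labeled, the macro precision and recall come out as 0.85 and 0.90, whereas reading the rows as actual classes reproduces the reported 90% precision and 85% recall and also matches the 390 pneumonia and 234 normal test images described in the Method section.

```python
from typing import List, Tuple

# Counts copied from Table 2; index 0 = normal, index 1 = pneumonia.
TABLE_2 = [
    [170, 64],   # first row of the table
    [10, 380],   # second row of the table
]


def accuracy(m: List[List[int]]) -> float:
    """Fraction of cases on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in m)
    correct = sum(m[k][k] for k in range(len(m)))
    return correct / total


def macro_metrics(m: List[List[int]], rows_are_actual: bool) -> Tuple[float, float, float]:
    """Macro-averaged (precision, recall, F1) for either reading of the rows."""
    n = len(m)
    row_totals = [sum(row) for row in m]
    col_totals = [sum(m[i][j] for i in range(n)) for j in range(n)]
    actual = row_totals if rows_are_actual else col_totals
    predicted = col_totals if rows_are_actual else row_totals
    precisions = [m[k][k] / predicted[k] for k in range(n)]
    recalls = [m[k][k] / actual[k] for k in range(n)]
    f1s = [2 * p * r / (p + r) for p, r in zip(precisions, recalls)]
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


print(round(100 * accuracy(TABLE_2), 2))                                    # 88.14
print([round(v, 2) for v in macro_metrics(TABLE_2, rows_are_actual=True)])  # [0.9, 0.85, 0.87]
print([round(v, 2) for v in macro_metrics(TABLE_2, rows_are_actual=False)]) # [0.85, 0.9, 0.87]
```

The accuracy and the F1 score are the same under both readings, because neither depends on which axis of the matrix holds the actual classes.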





9. CONCLUSION 

It can be concluded that using a pre-trained model reduces training time and yields better
performance in the detection of pneumonia in chest radiographs. This further shows that deep neural
networks can be trained with little data to achieve a better recognition rate. Evaluation of the model
developed on the test set yielded 88.14% accuracy, 90% precision, 85% recall and an F1 score of 0.87.
The model is very fast and can be used in medical departments for the analysis of chest radiographs for
pneumonia detection. The model accuracy of 88.14% can still be improved upon by further training the
network.